Speed-up Iterative Frequent Itemset Mining with Constraint Changes

نویسندگان

  • Gao Cong
  • Bing Liu
چکیده

Mining of frequent itemsets is a fundamental data mining task. Past research has proposed many efficient algorithms for the purpose. Recent work also highlighted the importance of using constraints to focus the mining process to mine only those relevant itemsets. In practice, data mining is often an interactive and iterative process. The user typically changes constraints and runs the mining algorithm many times before satisfied with the final results. This interactive process is very time consuming. Existing mining algorithms are unable to take advantage of this iterative process to use previous mining results to speed up the current mining process. This results in enormous waste in time and in computation. In this paper, we propose an efficient technique to utilize previous mining results to improve the efficiency of current mining when constraints are changed. We first introduce the concept of tree boundary to summarize the useful information available from previous mining. We then show that the tree boundary provides an effective and efficient framework for the new mining. The proposed technique has been implemented in the contexts of two existing frequent itemset mining algorithms, FP-tree and Tree Projection. Experiment results on both synthetic and reallife datasets show that the proposed approach achieves dramatic saving in computation.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A New Algorithm for High Average-utility Itemset Mining

High utility itemset mining (HUIM) is a new emerging field in data mining which has gained growing interest due to its various applications. The goal of this problem is to discover all itemsets whose utility exceeds minimum threshold. The basic HUIM problem does not consider length of itemsets in its utility measurement and utility values tend to become higher for itemsets containing more items...

متن کامل

Accelerating Parallel Frequent Itemset Mining on Graphics Processors with Sorting

Frequent Itemset Mining (FIM) is one of the most investigated fields of data mining. The goal of Frequent Itemset Mining (FIM) is to find the most frequently-occurring subsets from the transactions within a database. Many methods have been proposed to solve this problem, and the Apriori algorithm is one of the best known methods for frequent Itemset mining (FIM) in a transactional database. In ...

متن کامل

Users Constraints in Itemset Mining

Discovering significant itemsets is one of the fundamental tasks in data mining. It has recently been shown that constraint programming is a flexible way to tackle data mining tasks. With a constraint programming approach, we can easily express and efficiently answer queries with user’s constraints on itemsets. However, in many practical cases queries also involve user’s constraints on the data...

متن کامل

Mining itemset utilities from transaction databases

The rationale behind mining frequent itemsets is that only itemsets with high frequency are of interest to users. However, the practical usefulness of frequent itemsets is limited by the significance of the discovered itemsets. A frequent itemset only reflects the statistical correlation between items, and it does not reflect the semantic significance of the items. In this paper, we propose a u...

متن کامل

Frequent Itemset Mining Using Rough-Sets

Frequent pattern mining is the process of finding a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set. It was proposed in the context of frequent itemsets and association rule mining. Frequent pattern mining is used to find inherent regularities in data. What products were often purchased together? Its applications include basket data analysis, cro...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2002